I. INTRODUCTION



A. BACKGROUND 

In March of 1972, the General Accounting Office sent a 
preliminary report to Congress dealing with the acquisition 
of major weapon systems [Ref. 1:p. 1]. The GAO reported
that the Navy had experienced a cost growth of $19 billion 
on twenty-four weapon systems in FY 1971, of which 15 
percent was attributed to poor cost estimation. Inaccurate 
cost estimates for weapon systems can result in program 
delays, cost overruns, acquisition of systems that are not 
the most cost effective, and a lack of taxpayer confidence 
in military leaders, to name only a few of the consequences.
Congressional concern and a continuing need for better 
planning estimates have made it imperative that new 
techniques be developed and old methods be improved to 
obtain better cost estimates for major weapon system 
production and acquisition [Ref. 2:p. 1]. In the area of
cost estimation, an old technique that continues to be a
significant tool is the learning curve. 

The first study addressing the learning curve phenomenon 
was documented by the pioneer of the learning curve, T. P. 
Wright of the Curtiss-Wright Corporation, in his 1936 paper,
"Factors Affecting the Cost of Airplanes" [Ref. 3:p. 32]. 

Analysis of the data collected for a number of years
beginning in 1922 concerned the relationship of production
quantity with cost as measured in direct labor hours.
Wright claimed that each time the cumulative production
quantity doubled, the average unit cost for that quantity
decreased by a constant percentage, and that this relationship
plotted as a straight line on logarithmic paper. Wright's
formulation of the learning curve was:



\[ Y_c = aX^b \]



where 



X: cumulative production quantity 

Y_c: average cost per unit

b: factor of cost variation 

a: direct manhour cost for production unit number one 

Based on most of the literature available, it can safely 
be said that the principal factors contributing to the 
existence of this learning phenomenon include considerably 
more than just operator learning. Conway and Schultz
[Ref. 4:p. 42] believe that learning in aircraft production
is influenced by a number of during-production factors
including:



1) incentive pay

2) changes in tooling

3) design changes

4) management learning

5) volume changes

6) quality improvements



The rate of a learning curve is usually described by the 
complement of the reduction achieved when the production 
quantity is doubled. This value is usually called the slope 
of the curve and is found: 



\[ S = \frac{Y_{2X}}{Y_X} = \frac{a(2X)^b}{aX^b} = 2^b \]

where

b: slope of learning curve

S: fraction to which the cost decreases when the quantity doubles
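
The relation S = 2^b can be inverted to move between a quoted slope and the exponent b. The following is a minimal sketch only; the function names and the 80 percent example value are illustrative assumptions, not part of the original study.

```python
import math


def exponent_from_slope(slope: float) -> float:
    """Return b such that slope = 2**b (slope given as a fraction, e.g. 0.80)."""
    return math.log(slope) / math.log(2.0)


def slope_from_exponent(b: float) -> float:
    """Return the fraction S = 2**b to which cost falls when quantity doubles."""
    return 2.0 ** b


# Hypothetical example: an "80 percent" curve has a negative exponent.
b_80 = exponent_from_slope(0.80)
```

A slope below 1.0 always yields a negative b, which is the sign expected when learning is present.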



Wright believed that the cumulative average learning 
phenomenon plotted linearly on logarithmic scales and the 
unit learning curve formulation derived from this cumulative 
equation would be [Ref. 5:p. 266]: 



\[ Y_c = aX^b \]

\[ Y_T = Y_c \cdot X = aX^{b+1} \]

So

\[ Y_X = a\left[X^{b+1} - (X-1)^{b+1}\right] \approx a(b+1)X^b \quad \text{as } X \to \infty \]

where

Y_c: average cost per unit

Y_T: total cumulative cost

Y_X: cost of the Xth unit

a, b: parameters of the formulation
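
To see how quickly the asymptotic form approaches the exact unit cost implied by the cumulative average curve, the two expressions can be compared numerically. This is a sketch under assumed values (a = 100 manhours, an 80 percent slope); the function names are hypothetical.

```python
import math


def wright_unit_cost(a: float, b: float, x: int) -> float:
    """Exact cost of unit x implied by Y_c = a*X**b: difference of total costs."""
    return a * (x ** (b + 1) - (x - 1) ** (b + 1))


def wright_unit_cost_asymptotic(a: float, b: float, x: int) -> float:
    """Large-X approximation a*(b+1)*X**b of the unit cost."""
    return a * (b + 1) * x ** b


# Assumed illustration values: first-unit cost 100, 80 percent slope.
a = 100.0
b = math.log(0.80) / math.log(2.0)

exact_2 = wright_unit_cost(a, b, 2)
approx_2 = wright_unit_cost_asymptotic(a, b, 2)
exact_1000 = wright_unit_cost(a, b, 1000)
approx_1000 = wright_unit_cost_asymptotic(a, b, 1000)
```

With these assumed values the two expressions differ noticeably at the second unit but agree closely by the thousandth, which is the steep early drop in the derived unit curve.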

J. R. Crawford, another major contributor to the 
literature and theory of learning curves, disagreed with 
T. P. Wright in the log-linear formulation of the cumulative 
average learning curve [Ref. 6:p. 21]. His disagreement was 
based on the apparently steep slope between early production 
units of the unit learning curve derived from the cumulative 
curve. In Crawford's studies, he described the learning 
phenomenon in what has been termed the unit learning curve: 



\[ Y_X = aX^b \]

where

Y_X: cost of the Xth unit

X: cumulative amount of units produced

a: manhour cost for the first production unit

b: factor of cost variation
The cumulative average cost curve derived from the unit
curve is [Ref. 6:p. 21]:

\[ Y_X = aX^b \]

\[ Y_T = a\sum_{x=1}^{X} x^b \]

\[ Y_c = \frac{a\sum_{x=1}^{X} x^b}{X} \approx \frac{a}{1+b}X^b \quad \text{as } X \to \infty \]

where

Y_X: cost of the Xth unit

Y_T: total cumulative cost

Y_c: average cost per unit produced

a, b: formulation parameters
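
The same kind of check applies in the other direction: the exact cumulative average implied by a unit curve can be compared with its large-X approximation. The sketch below assumes a = 100 manhours and an 80 percent slope purely for illustration; the function names are hypothetical.

```python
import math


def crawford_cum_avg(a: float, b: float, x: int) -> float:
    """Exact cumulative average cost through unit x when Y_X = a*X**b."""
    total = sum(a * k ** b for k in range(1, x + 1))
    return total / x


def crawford_cum_avg_asymptotic(a: float, b: float, x: int) -> float:
    """Large-X approximation (a/(1+b))*X**b of the cumulative average."""
    return a / (1.0 + b) * x ** b


def relative_gap(a: float, b: float, x: int) -> float:
    """Relative difference between the approximation and the exact average."""
    exact = crawford_cum_avg(a, b, x)
    return abs(crawford_cum_avg_asymptotic(a, b, x) - exact) / exact


# Assumed illustration values.
a = 100.0
b = math.log(0.80) / math.log(2.0)
gap_early = relative_gap(a, b, 10)
gap_late = relative_gap(a, b, 1000)
```

The gap shrinks as X grows, so the approximation is weakest over the early units.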

For years both the unit learning curve and the
cumulative average learning curve have been used almost
interchangeably. Womer and Patterson [Ref. 5:p. 266] show
that this is so because, for large values of X, each
curve is a good approximation for the other. They go on to
say that a problem arises, however, since learning curves
are generally fitted to the first few units of output to
forecast the cost of an entire production run. Even though
forecasts may be for large values of X, the data used to
make them are not. Under these circumstances, the estimated
cumulative average learning curve, for example, may approach
a unit learning curve, but not necessarily the same unit
curve that would be estimated from early units. The choice
of log-linear learning curve specification, unit or
cumulative, has through the years been a source of
inaccurate cost estimation. Although 93 percent of all
firms utilize Crawford's unit learning curve [Ref. 7:p. 23],
there are enough exceptions to suggest that experience
remains the best guide for choosing a particular model.

Following World War II, Gardner Carr of the McDonnell
Aircraft Corporation argued that representing learning curves
as linear on logarithmic paper was an inaccurate portrayal
of the learning phenomenon. In his April 1946 article
[Ref. 8:p. 77], Carr held that the straight line was
adequate for overall project statistics but was rarely
correct for budget or actual cost finding purposes. He
believed that the cumulative average learning curve was
S-shaped on the logarithmic scale. Explanations for the
various segment shapes of this curve are found in a RAND
report by Asher, "Cost Quantity Relationships in the
Airframe Industry" [Ref. 6:p. 28].
Another study which suggested that learning curves do
not adhere to log-linearity was conducted by the Stanford
Research Institute following World War II. The Stanford
system utilizes the 'B-factor' which, basically, modifies
the standard learning curve for prior experience. The
formulation of this learning curve is:

\[ Y = a(X + B)^b \]

where

Y: cost per unit in manhours

a: theoretical first unit cost

X: cumulative quantity produced

B: modification factor

The effect of this formulation is a concave curve on the
logarithmic scale. The cost of the first unit is depressed
and the curve arcs to the standard learning curve [Ref. 7:
p. 8].

Further research that deviated from the log-linearity
hypothesis was conducted. Another perspective of the
production process is that various departments contribute to
the overall quantity of direct labor hours. Generally
speaking, these departments are fabrication, subassembly,
major assembly, and final assembly. It seems obvious that each
department contributing to the learning curve would itself
have its own learning curve. In order for the various
departments to have their learning effects sum to an overall
production process log-linear learning curve, each of the
department slopes must be identical. In practice, the
various departments often have different slopes. Summing
these curves would result in a departure from log-linearity
and arrive at a convex curve whose slope is bounded by the
flattest of the component curves. In "Cost Quantity
Relationships in the Airframe Industry" [Ref. 6:p. 69],
Asher uses this argument while conducting a significant
analysis disputing the log-linear hypothesis of the
formulation of the learning curve. In his report, he also
cites research done previously by P. B. Crouse, G. M.
Giannini, and P. Guibert supporting his contentions. Asher
concludes, however, that his study

. . . does not discredit the use of the linear progress
curve .... The linear curve is useful for making
extrapolations beyond the data range provided the number
of additional units is small. It is clearly a matter of
judgement whether or not in a specific instance the linear
curve is appropriate .... If allowable error is
relatively small, a convex curve resulting from predicting
each of the component curves separately is probably more
appropriate.

Another approach to research in the theory of learning
curves has involved the inclusion of production rate as an
explanatory variable in learning curve models. In his
1963 article [Ref. 9:p. 679], Alchian cites work done in 1948
that concluded production rate is not a relevant variable.
Results published by Smith [Ref. 10:p. 138], and
supported by Kinton and Congelton [Ref. 11:p. 92], concluded
that production rate plays a significant role in explaining
the effects of learning, but other studies with contradictory
results exist. Womer and Gulledge have produced a considerable
literature discussing the effects of production rate,
which resulted in a final report for the Air Force [Ref. 12:
p. 5] that addresses the contradictory results of previous
research and develops a cost function including
production rate and the cost-quantity relationship of
learning curve theory.

In his article "The Learning Curve: Historical Review
and Comprehensive Study" [Ref. 13:p. 302], Yelle states that
most of the literature in learning curve theory, from its
inception through the 1960's, has focused primarily on
military applications in the early years through World War
II and on industry and business in the more recent years.
Across the years and the various paths that research in this
area has followed, most of the studies do not reach
consistent conclusions. The early goals of developing a
general formulation of the learning curve that could be
applied to the entire aircraft industry or subsets of it
were quickly abandoned. Despite the vast amounts of
literature disputing the log-linear relationship between
cost and cumulative quantity produced, the unit learning
curve is still the most widely used formulation
in cost estimation today [Ref. 7:p. 7].
B. OBJECTIVES



The preceding pages and references provide a brief
summary of the research expended on the theory of the
learning curve over the past half century. The important
point is that the learning phenomenon and its numerous
formulations in aircraft and other industries have
been an area of extensive research, and that the learning
curve continues to be a viable tool in the world of
production economics.

The purpose of this research is to conduct an empirical
study of still another theoretical reformulation of the
learning curve. In "Budgets, Contracts, Incentives and
Costs: A Stylized Nexus", by Boger, Jones and Sontheimer
[Ref. 14:p. 23], the cumulative average learning curve is
reformulated to examine the influence that cost forecasting and
budget formation have on the incentives bearing on the firm
for cost control. The model developed by Boger et al., a
cumulative average learning curve model, and a unit learning
curve model will be estimated through simple linear and
non-linear regression techniques using several sets of aircraft
production data. For each formulation of the learning
curve, the models resulting from the two fitting techniques
will be analyzed, validated, and compared. Finally, the
Boger et al. model will be compared with the classical
learning curve models for empirical validation.
II. THE MODELS

A. CUMULATIVE AVERAGE LEARNING CURVE 

The cumulative average learning curve, as discussed 
above, was first formulated by T. P. Wright in the 1930's. 
The log-linear relationship between cumulative production 
quantity and average cost per unit is: 



\[ Y_c = aX^b \]

where

X: cumulative production quantity

Y_c: average cost per unit

b: factor of cost variation

a: direct manhour cost for first unit

The cumulative production quantity is usually expressed
as an integer number of units produced. The cost variable
is measured in direct manhours expended in the production of
the cumulative quantity produced. We expect the learning
curve slope, or factor of cost variation, to have a negative
value when we anticipate the presence of learning in the
production of some product. This formulation also
presupposes a relatively constant rate of production and
uniformity of units produced. Deviations from these last
assumptions are recognizable in a plot of the raw data,
e.g., toe up, toe down, bottom out, scallop.



B. UNIT LEARNING CURVE 

The unit learning curve, as also discussed above, was 
first formulated by J. R. Crawford. He disagreed with 
Wright's log-linear formulation of the cumulative average 
learning curve. Crawford believed the relationship between 
cumulative quantity produced and the cost of the final unit 
of that quantity was log-linear and was formulated as: 



\[ Y_X = aX^b \]

where

Y_X: cost of the final unit

X: cumulative quantity produced

a: direct manhour cost for first unit

b: factor of cost variation

The same comments and assumptions concerning the cumulative 
average learning curve apply. 

C. BOGER, JONES, AND SONTHEIMER MODEL 

Boger, Jones, and Sontheimer express the costs of
production over a time period as opposed to over the 
production of cumulative units regardless of time. They use 
the cumulative average learning curve as the starting point 
in their formulation. 
As discussed above, the typical cumulative average
learning curve is of the form:

\[ Y(t) = aQ(t)^b \tag{1} \]

where now

Y(t): average cost per unit

Q(t): cumulative quantity of units produced through time t

a, b: learning curve parameters

The typical progress function (learning curve) treats the
inputs as varying continuously and causing a related
continuous variation in some product (output) [Ref. 14:
p. 23]. From (1) we can derive an expression for total
cost:

\[ Q(t) \cdot Y(t) = aQ(t)^b \, Q(t) \]

\[ X(t) = aQ(t)^{b+1} \tag{2} \]

where

X(t): total quantity of inputs consumed by the production of Q(t)

This specification yields the following marginal requirements,
dX, for an incremented output, dQ:

\[ \frac{dX}{dQ} = a(b+1)Q^b \tag{3} \]
Now, assume the product emerges in quantities at discrete
time intervals. That is, we now develop an algorithm using
the cumulative average learning curve formulation based on
how many units are produced in a specified time period. In
application, we assume that progress or cost per quantity is
proportional to productivity achieved in prior production:

\[ X_t = \delta_t \frac{X_{t-1}}{q_{t-1}} q_t \tag{4} \]

where

q_t = dQ: amount produced in time period t

X_t = dX: inputs used in time period t

δ_t: proportionality constant

We assume that learning is derived not only from the
preceding period but from all the production prior to the
period we are in. So we first set:

\[ \frac{X_t}{q_t} = \frac{dX}{dQ} = a(b+1)Q^b \]

where

Q = Q(t)

Substituting (4) we get:

\[ \delta_t \frac{X_{t-1}}{q_{t-1}} \frac{q_t}{q_t} = a(b+1)Q^b \]

\[ \delta_t \frac{X_{t-1}}{q_{t-1}} q_t = a(b+1)Q^b q_t \tag{5} \]
We now let Q, the quantity of units produced up to time t,
be equal to the quantity of units produced through time
period t-1. Now, substituting into (5):

\[ \delta_t \frac{X_{t-1}}{q_{t-1}} q_t = a(b+1)\Big[\sum_{j=1}^{t-1} q_j\Big]^b q_t \tag{6} \]

Equation (6) assumes learning in period t is derived only
from production in period t-1. We assume this relationship
must hold at previous time periods also. So rewriting (4)
and (5) for period t-1:

\[ X_{t-1} = \delta_{t-1} \frac{X_{t-2}}{q_{t-2}} q_{t-1} = a(b+1)(Q^*)^b q_{t-1} \]

where

Q*: amount of units produced through time period t-2

Therefore,

\[ X_{t-1} = a(b+1)\Big[\sum_{j=1}^{t-2} q_j\Big]^b q_{t-1} \]

which leads to:

\[ \frac{X_{t-1}}{q_{t-1}} = a(b+1)\Big[\sum_{j=1}^{t-2} q_j\Big]^b \]
and substituting into (6):

\[ \delta_t \, a(b+1)\Big[\sum_{j=1}^{t-2} q_j\Big]^b q_t = a(b+1)\Big[\sum_{j=1}^{t-1} q_j\Big]^b q_t \]

\[ \delta_t = \left[\frac{\sum_{j=1}^{t-1} q_j}{\sum_{j=1}^{t-2} q_j}\right]^b \quad \text{for } t = 3, 4, 5, \ldots, T \tag{7} \]

Now substituting (7) into (4) we have:

\[ X_t = \left[\frac{\sum_{j=1}^{t-1} q_j}{\sum_{j=1}^{t-2} q_j}\right]^b \frac{X_{t-1}}{q_{t-1}} q_t \]

\[ \frac{X_t}{q_t} = \left[\frac{\sum_{j=1}^{t-1} q_j}{\sum_{j=1}^{t-2} q_j}\right]^b \frac{X_{t-1}}{q_{t-1}} \]

Since this is true for all time periods, we can say:
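\[ \frac{X_{t-1}}{q_{t-1}} = \left[\frac{\sum_{j=1}^{t-2} q_j}{\sum_{j=1}^{t-3} q_j}\right]^b \frac{X_{t-2}}{q_{t-2}} \]

\[ \frac{X_{t-2}}{q_{t-2}} = \left[\frac{\sum_{j=1}^{t-3} q_j}{\sum_{j=1}^{t-4} q_j}\right]^b \frac{X_{t-3}}{q_{t-3}} \]

and so on.

So, substituting recursively we have

\[ \frac{X_t}{q_t} = \frac{\big[\sum_{j=1}^{t-1} q_j\big]^b}{\big[\sum_{j=1}^{t-2} q_j\big]^b} \cdot \frac{\big[\sum_{j=1}^{t-2} q_j\big]^b}{\big[\sum_{j=1}^{t-3} q_j\big]^b} \cdot \frac{\big[\sum_{j=1}^{t-3} q_j\big]^b}{\big[\sum_{j=1}^{t-4} q_j\big]^b} \cdots \frac{\big[\sum_{j=1}^{2} q_j\big]^b}{q_1^b} \cdot \frac{X_2}{q_2} \]

\[ = \frac{X_2}{q_2} \left[\frac{\sum_{j=1}^{t-1} q_j}{q_1}\right]^b \]

\[ = Z \left[\frac{\sum_{j=1}^{t-1} q_j}{q_1}\right]^b \]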
where

Z: direct manhours per quantity produced in second time period

Σ q_j (j = 1, ..., t-1): total quantity of units produced prior to present time period

q_1: quantity of units produced in time period one

b: factor of cost variation

X_t/q_t: average cost in direct manhours of units produced in time period t

The length of the time period, although it must remain fixed
over the data space, can be any length, e.g., day, month, or
quarter. The quantity produced in a particular time period
need not be an integer amount although partial units
produced are generally not found in aircraft production
data. As in the cumulative average and unit learning curve
formulations, we expect the factor of cost variation to have
a negative value. This model also presupposes uniformity
between production units and also a constant production
rate.
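
The final expression of the derivation can be evaluated directly from a series of per-period quantities. The sketch below is illustrative only: the function name, the quantities, and the values of Z and b are assumptions, not data or estimates from this study.

```python
def bjs_average_cost(q, z, b):
    """Predicted X_t/q_t for periods t = 2, ..., T.

    q: per-period quantities, with q[0] = q_1 (hypothetical inputs)
    z: direct manhours per unit produced in the second period (Z)
    b: factor of cost variation (expected to be negative)
    """
    predictions = []
    cumulative = 0.0
    for t in range(1, len(q)):
        # Production through the period preceding the one being predicted.
        cumulative += q[t - 1]
        predictions.append(z * (cumulative / q[0]) ** b)
    return predictions


# Assumed illustration values.
quantities = [2.0, 4.0, 6.0, 8.0]
predicted = bjs_average_cost(quantities, z=1000.0, b=-0.5)
```

For the second period the bracketed ratio equals one, so the prediction reduces to Z, consistent with the definition of Z as the average cost in the second period.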
III. DATA



A. GENERAL 

The dependent variable in each of the models investi- 
gated will involve a cost of some type. In each of our 
models this cost will be measured as a function of direct 
manhours expended in the production of some quantity of 
units. Direct manhours will be defined as those hours spent 
on fabrication, assembly, production flight, and other 
production work associated with the basic aircraft. All 
manhours pertaining to tooling, engineering, planning, 
testing and subcontracting are not included in this 
definition. The way in which direct
manhours are accumulated can, and does, lead to inconsistencies
due to differences in accounting systems from
contractor to contractor. Even so, the use of direct manhours has
numerous advantages over the use of dollars as a measure of
cost. In using direct manhours, we avoid the additional
data computations involved in applying price indices to
transform all dollar costs into constant dollars. We also
avoid inaccuracies in the data caused by using price indices,
which are inexact figures. Finally, direct manhours is a
variable comparable over a group of contractors, whereas
costs measured in dollars are not the best tool for
comparison because of differences in wage rates from
contractor to contractor.

The data for this report include aircraft production 
data for the C-141 and F-102. The C-141 was produced by the 
Lockheed Corporation and the F-102 was produced by General 
Dynamics. The C-141 program produced 284 aircraft from July 
1962 through April 1968. The C-141 is a large, swept wing, 
4 jet engine cargo transport. The data for this study were 
drawn from Orsini [Ref. 15:p. 104]. Orsini obtained the
data from C-141 Financial Management Reports prepared by the
contractor, Lockheed Aircraft Corporation, for the Air
Force. The C-141 data provided a large sample of data for
which a basic model of the aircraft was produced throughout
the production program. Uniformity between units produced
is a basic assumption in the application of learning
curve theory.

The F-102 program produced 1000 aircraft from 1953
through 1958. The F-102 is a single seat, supersonic, delta
wing, all-weather fighter. The data for this study were
drawn from Gulledge and Womer [Ref. 12:p. 73]. A
comprehensive cost breakdown by individual airframe was
provided by the "F-102 Program Cost History" document, the
source of the Womer and Gulledge data. The F-102 program
consisted of the production of F-102 airframes and TF-102
airframes. The TF-102 observations were not deleted for
the sake of strict uniformity, since it was assumed that
learning was experienced in the production of these
airframes. As Womer
and Gulledge note, the total manhours expended per airframe
can be disaggregated into three parts: details, assemblies,
and outside-of-factory labor. Total direct cost per
airframe is comprised of only detail and assembly hours.
The detail hours consist of fabrication hours, and
assembly hours include subassembly, major assembly, primary
assembly, and final assembly hours. After the portion of
labor hours expended per airframe outside the factory is
deleted, the total direct cost per airframe is left.

B. REFINEMENT 

As already discussed, three models will be utilized in
the examination of two sets of aircraft production data.
Parameter estimation for these models requires the data to be
in a particular form for each model. The C-141 production
data are available for aircraft grouped into production lots
and the F-102 production data are available for each
airframe. Since the models do not each fit the particular
form of each data set, adjustments and refinements need to
be made to the data to fit the different learning curve
formulations.

1. Cumulative Average Learning Curve

The data requirements for the cumulative average 
learning curve are rather straightforward. The independent 
variable is the cumulative quantity of aircraft produced. 
The dependent variable is the average amount of direct labor 
hours expended per unit in the production of the cumulative 
quantity produced. The F-102 and C-141 adjusted data used 
to fit the cumulative average learning curve are tabulated 
in Appendix A. 

The F-102 data consist basically
of total hours expended in the production of each airframe.
This data set is easily refined to meet the
data requirements of the cumulative average learning curve. As
previously discussed, the F-102 total direct manhours per
aircraft consisted of three parts: details, assemblies, and
outside-of-factory labor. Table I, extracted from Womer and
Gulledge [Ref. 12:p. 86], provided the information necessary
to translate the raw data into direct manhours per airframe.
Since this table only applied to lots four through eleven,
only these 204 observations were utilized. The airframes in
lots four through eleven were then ordered with respect to
delivery sequence number. It was this sequence (1, 2, 3,
..., 204) that provided the independent variable data
vector. The sequence of cumulative sums of direct manhours
divided by the cumulative number of airframes delivered for
each element of that sequence provided the dependent
variable data vector.

[TABLE I. PERCENT OF TOTAL MANHOURS ALLOCATED TO
SPECIFIC ACTIVITIES BY CONTRACT]

The C-141 data were organized into twelve lots. The
number of units in each lot and the number of direct manhours
expended in the production of each lot of airframes are
provided. The data required for the cumulative average
learning curve are arrived at through a series of simple
calculations discussed in the RAND Memorandum "An Introduction
to Equipment Cost Estimating" [Ref. 16:p. 104]. The
cumulative average hours are computed at the final unit in
each lot, where the cumulative average hour figures apply.
Therefore, twelve data points will be used in the parameter
estimation for the C-141 cumulative average learning curve
formulation.
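
The refinement described in this subsection amounts to a running total of units and hours, evaluated at the last unit of each lot. The following is one plausible sketch of that calculation; the function name and the lot figures are made up for illustration and are not the C-141 or F-102 data. Per-airframe data such as the F-102 series are the special case in which every lot has size one.

```python
def cumulative_average_points(lot_sizes, lot_hours):
    """Return (cumulative units, cumulative average hours) at each lot's last unit.

    lot_sizes: number of units in each lot, in production order
    lot_hours: direct manhours expended on each lot
    """
    points = []
    cum_units = 0
    cum_hours = 0.0
    for units, hours in zip(lot_sizes, lot_hours):
        cum_units += units
        cum_hours += hours
        points.append((cum_units, cum_hours / cum_units))
    return points


# Hypothetical lots: 5 units for 1000 hours, then 10 units for 1500 hours.
example_points = cumulative_average_points([5, 10], [1000.0, 1500.0])
```

Each lot contributes exactly one data point, which is why twelve lots yield twelve observations.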

2. Unit Learning Curve

The data requirements for the unit learning curve 
are also rather straightforward. The independent variable 
is the cumulative quantity of aircraft produced. The 
dependent variable is the amount of direct manhours expended 
in the production of the final unit of the cumulative 
quantity produced. The F-102 and C-141 adjusted data used 
to fit the unit learning curve are tabulated in Appendix B. 
The F-102 data are again
easily refined to meet the data requirements of the unit
learning curve. Table I is used to translate the raw data
of lots four through eleven into direct manhours per
airframe. The airframes were then ordered with respect to
delivery sequence number. The sequence numbers of these 204
airframes and each unit's respective direct labor hours
required for production are used as the independent and
dependent variable data vectors for the estimation of the
parameters of the unit learning curve.

Since the C-141 production data are grouped into 
lots, a rather gross approximating technique is required to 
transform the data into the form required by the unit 
learning curve specification. The average number of labor 
hours for each lot is treated as if it were an observation 
on the labor hours required to produce the unit at the lot 
midpoint. When dealing with a log-linear relationship, the 
arithmetic midpoint produces unequal areas under the 
learning curve between the first and last units of each 
respective lot. The exact determination of a true lot
midpoint depends on the lot quantity, type of curve hypothesized,
and the true slope of the learning curve [Ref. 16:
p. 105]. In order to avoid the shortcomings of the
arithmetic midpoint, the algebraic midpoint, K, discussed in
[Ref. 17:p. 44] will be used:
\[ K = \left[ \frac{m(1+B)}{(L + 0.5)^{1+B} - (F - 0.5)^{1+B}} \right]^{-1/B} \]

where

m: lot quantity

B: learning curve slope

L: last unit of the lot

F: first unit of the lot

An estimate of B from Womer and Patterson's report
[Ref. 5:p. 267] is used in calculating the algebraic
midpoint. Again, twelve data points are used in the
parameter estimation for the C-141 unit learning curve
specifications.
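
The algebraic midpoint can be computed directly from the first and last unit numbers of a lot once a value of B is assumed. This is a minimal sketch; the function name and the 80 percent slope are illustrative assumptions and do not reproduce the Womer and Patterson estimate.

```python
import math


def algebraic_midpoint(first: int, last: int, b: float) -> float:
    """Algebraic lot midpoint K for a lot running from unit `first` to `last`.

    b is the learning curve exponent B; it must be nonzero and not equal to -1.
    """
    m = last - first + 1  # lot quantity
    c = 1.0 + b
    denominator = (last + 0.5) ** c - (first - 0.5) ** c
    return (m * c / denominator) ** (-1.0 / b)


# Assumed illustration: a first lot of ten units on an 80 percent curve.
b = math.log(0.80) / math.log(2.0)
k_first_lot = algebraic_midpoint(1, 10, b)
```

With a negative B the algebraic midpoint falls below the arithmetic midpoint of the lot, reflecting the faster cost decline over the lot's early units.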

3. Boger, Jones, and Sontheimer Model

The data requirements for this model are based on
the statement regarding the marginal requirements for
incremental outputs of product produced in Boger, Jones, and
Sontheimer's paper [Ref. 14:p. 23]. That is, the product
emerges in lots or lumps, q_t, at discrete intervals using
discrete inputs, X_t, of the composite resource (direct labor
hours). Therefore, the data requirements for this model
are the quantity of units produced each time period and the
direct labor hours expended in the production of units
produced in each time period.

The complete data base for the F-102 program
contains total labor hours for each airframe. These data are
not in the form required for the Boger et al. model. Womer
and Gulledge took considerable care in resolving the data
problem in their study [Ref. 12:p. 85]. Their work made the
data compatible with the theoretical model they were
testing. The information concerning the F-102 program that
Womer and Gulledge discuss made it possible to apply some
further adjustments to establish a data base compatible with
the Boger et al. model.

As discussed before, the ideal data for the Boger
et al. model are the total number of aircraft produced in a
specific time period, q_t, and the quantity of direct labor
hours, X_t, expended in producing q_t. Although these data are
not directly available, Womer and Gulledge derived the next
best alternative: cost by lot per month. Due to
non-availability of certain information, Womer and Gulledge
were only able to approximate the cost by lot per month for lots
four through eleven.

[TABLE. F-102 PERCENT ASSEMBLY HOURS COMPLETED]

Tables I, II, and III along with the F-102 data base
in [Ref. 12:pp. 83-85] provided enough information to adjust
the data for lots four through eleven for use in the Boger
et al. model. The first adjustment was to use Table I and
the total labor hours expended on each airframe in lots four
through eleven to arrive at values for cumulative fabrication
and assembly hours for each airframe. As discussed
earlier, these hours comprise the direct labor hours
expended for each airframe. The next step was to calculate
the equivalent airframe units produced per month for each
lot. This was calculated by first determining the empirical
production rates for each lot:



\[ Y_f = \frac{\sum_{\text{aircraft in lot}} DMH_f}{\text{airframes in lot}} \quad \text{for lots } 4, 5, 6, \ldots, 11 \]

\[ Y_a = \frac{\sum_{\text{airframes in lot}} DMH_a}{\text{airframes in lot}} \quad \text{for lots } 4, 5, 6, \ldots, 11 \]

Production rate (fab) = 1/Y_f

Production rate (assem) = 1/Y_a

DMH_f: direct manhours for fabrication

DMH_a: direct manhours for assembly

The production rates for fabrication and assembly were then
applied in conjunction with Tables II and III to the
cumulative fabrication and assembly hours per month per lot,
then added to arrive at equivalent aircraft produced per
month per lot. These results were then summed across lots
four through eleven for each month, using
Tables II and III, to arrive at equivalent units produced per
month. Direct labor hours expended per month on the
equivalent quantity of airframes produced per month were
similarly calculated. The adjusted F-102 production data
per month for lots four through eleven for use in the Boger
et al. model are summarized in Appendix C.
The original form of the C-141 data made available
to Orsini by the Air Force Plant Representative Office was
direct manhours expended per lot per month, with direct labor
hours as defined previously, and the quantity of aircraft per
lot. Orsini then aggregated these monthly data into
quarterly data points and tabulated them as direct manhours
per lot per quarter. The adjustments Orsini made to the data
for his analysis were compatible with the refinements
required by the Boger et al. model. The average production
rate for each lot was first determined by dividing the total
aircraft in each lot by the total direct labor
hours attributed to the production of that lot.
This average production rate was then applied to the
tabulated quarterly data to arrive at equivalent units
produced per lot per quarter. The equivalent units produced
per lot per quarter and the direct labor hours per lot per
quarter were then summed across lots, over the quarters in
which each lot was worked on, to arrive at equivalent units
produced per quarter and direct labor hours expended per
quarter. The data, as refined by Orsini and used in the
Boger et al. model, are tabulated in Appendix C.
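
The equivalent-unit adjustment described for the C-141 data can be sketched as follows. This is a minimal illustration only, not the procedure as originally carried out: the function name `equivalent_units`, the dictionary layout, and all numbers are hypothetical and are not C-141 data.

```python
def equivalent_units(aircraft_per_lot, hours_by_lot_quarter):
    """Return (equivalent units per quarter, labor hours per quarter).

    aircraft_per_lot:      {lot: number of aircraft in the lot}
    hours_by_lot_quarter:  {lot: {quarter: direct labor hours}}
    """
    units, hours = {}, {}
    for lot, by_quarter in hours_by_lot_quarter.items():
        total_hours = sum(by_quarter.values())
        # Average production rate: aircraft per direct labor hour for the lot.
        rate = aircraft_per_lot[lot] / total_hours
        for quarter, h in by_quarter.items():
            # Equivalent units for this lot and quarter, summed across lots.
            units[quarter] = units.get(quarter, 0.0) + rate * h
            hours[quarter] = hours.get(quarter, 0.0) + h
    return units, hours


# Hypothetical numbers, for illustration only (not C-141 data).
aircraft = {1: 5, 2: 10}
lot_hours = {
    1: {1: 600.0, 2: 400.0},    # lot 1 worked on in quarters 1 and 2
    2: {2: 500.0, 3: 1000.0},   # lot 2 worked on in quarters 2 and 3
}
units, hours = equivalent_units(aircraft, lot_hours)
for q in sorted(units):
    print(q, round(units[q], 3), hours[q])
```

Because each lot's hours are spread in proportion to its average production rate, the equivalent units summed over all quarters equal the total number of aircraft in the lots.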
IV. METHODOLOGY



A. LINEAR REGRESSION 

Historically, it has usually been assumed that the 
relationship between the independent and dependent variables 
of a learning curve specification is log-linear. This 
assumption has made it particularly easy to estimate the 
learning curve parameters through simple linear regression 
when only one independent variable is used. In this study,
the least squares normal error regression model is
used. The normal error model is:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \quad \text{for } i = 1, 2, 3, \ldots$$

where

$Y_i$: observed response of the $i^{th}$ trial

$X_i$: the level of the independent variable in the $i^{th}$ trial

$\beta_0, \beta_1$: regression parameters

$\varepsilon_i$: residuals which are distributed $N(0, \sigma^2)$

Normality of the error terms seems reasonable since the
residuals probably represent the accumulation of many
effects that are omitted from the model. The cumulative
error term, $\varepsilon_i$, would tend to comply with the central
limit theorem and approach normality. Since the error terms are
assumed to be normally distributed, the assumption of no
correlation between residuals becomes one of independence.
Furthermore, the assumption of normality allows one to perform
parametric statistical tests in evaluating the
statistical significance of the estimated parameters and the
aptness of the model.
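
As an illustration of the log-linear assumption, taking logarithms of Wright's formulation Y = aX^b gives ln Y = ln a + b ln X, which has the form of the normal error model with intercept ln a and slope b. The following is a minimal sketch of that fit by ordinary least squares. The function name `fit_log_linear` and the data are hypothetical; the data are noise-free values generated from an assumed 80 percent curve and are not taken from this study.

```python
import math


def fit_log_linear(x, y):
    """Least squares fit of ln(y) = ln(a) + b*ln(x); returns (a, b)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                 # slope estimate
    a = math.exp(my - b * mx)     # intercept estimate, transformed back
    return a, b


# Hypothetical noise-free data: cost falls to 80 percent at each doubling.
b_true = math.log(0.8) / math.log(2)
x = [1, 2, 4, 8, 16]
y = [1000.0 * q ** b_true for q in x]

a_hat, b_hat = fit_log_linear(x, y)
print(a_hat, b_hat, 2 ** b_hat)   # 2**b is the cost ratio per doubling
```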

B. NON-LINEAR REGRESSION 

Non-linear regression software in STATGRAPHICS [Ref. 18:
pp. 19-35] is used as an alternative method of parameter
estimation. In this procedure, least squares estimates of
the parameters of a non-linear model are determined. The
learning curve formulations in this study are inherently
non-linear when the data are in their raw form. The
non-linear model is:

$$Y_i = a X_i^{b} + \varepsilon_i \quad \text{for } i = 1, 2, 3, \ldots$$

where

$Y_i$: observed response of the $i^{th}$ trial

$X_i$: level of the independent variable of the $i^{th}$ trial

$a, b$: regression parameters

$\varepsilon_i$: residuals which are distributed $N(0, \sigma^2)$

The non-linear regression method utilized in the 
STATGRAPHICS software was developed by D. W. Marquardt and 
represents a compromise between the linearization (Taylor 
series) method and the steepest descent method of non-linear 
parameter estimation. Marquardt's compromise has been
described as combining the best features of the linearization
and steepest descent methods while avoiding their most
serious limitations. A detailed discussion and references
for this algorithm are contained in Draper and Smith's
Applied Regression Analysis, Second Edition [Ref. 19:
p. 471]. An important aspect of non-linear regression that
deviates from the linear case is worth mentioning. When the 
error term of the non-linear model is assumed to be normally 
distributed, the parameter estimates are no longer normally 
distributed and the sample residual variance is no longer an 
unbiased estimate of the residual variance. While suitable 
comparison of mean squares can be made visually, the usual 
F-tests for regression and lack of fit are not valid, in 
general, for the non-linear case [Ref. 19:p. 484]. 
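
The idea behind Marquardt's compromise can be sketched for the two-parameter model Y = aX^b as follows. This is a simplified illustration and is not the STATGRAPHICS implementation: the function names, the starting values, the rule for adjusting the damping factor `lam`, and the data are all assumptions made for the example. A small `lam` gives a step close to the linearization method, while a large `lam` shortens the step and turns it toward the steepest descent direction.

```python
import math


def sse(x, y, a, b):
    """Sum of squared errors for the model y = a * x**b."""
    return sum((yi - a * xi ** b) ** 2 for xi, yi in zip(x, y))


def fit_power_marquardt(x, y, a, b, lam=1e-3, max_iter=200):
    """Damped least squares fit of y = a * x**b from starting values (a, b)."""
    best = sse(x, y, a, b)
    for _ in range(max_iter):
        # Accumulate J'J and J'r, where J holds the partial derivatives
        # of the model with respect to a and b.
        saa = sab = sbb = ga = gb = 0.0
        for xi, yi in zip(x, y):
            p = xi ** b
            ja = p                        # d(model)/da
            jb = a * p * math.log(xi)     # d(model)/db
            r = yi - a * p                # current residual
            saa += ja * ja
            sab += ja * jb
            sbb += jb * jb
            ga += ja * r
            gb += jb * r
        # Damped normal equations: (J'J + lam * diag(J'J)) d = J'r
        m11 = saa * (1.0 + lam)
        m22 = sbb * (1.0 + lam)
        det = m11 * m22 - sab * sab
        if det == 0.0:
            break
        da = (m22 * ga - sab * gb) / det
        db = (m11 * gb - sab * ga) / det
        trial = sse(x, y, a + da, b + db)
        if trial < best:
            # Step accepted: reduce damping (closer to linearization).
            a, b, best = a + da, b + db, trial
            lam = max(lam / 10.0, 1e-12)
        else:
            # Step rejected: increase damping (closer to steepest descent).
            lam *= 10.0
            if lam > 1e12:
                break
    return a, b


# Hypothetical noise-free data from an assumed 80 percent curve.
b_true = math.log(0.8) / math.log(2)
x = [1, 2, 4, 8, 16]
y = [1000.0 * q ** b_true for q in x]

a_hat, b_hat = fit_power_marquardt(x, y, a=800.0, b=-0.2)
print(a_hat, b_hat)
```

Unlike the log-linear fit, this procedure minimizes squared error in the original units of Y, so with noisy data the two methods will generally give different parameter estimates.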

C. DATA ANALYSIS 

Examination of the observed residuals of a regression 
model is an important aspect of any regression technique. 
If the model is appropriate, the observed residuals should 
reflect the properties assumed for the error term in the 
regression model. In this study, both graphical and 
statistical tests involving the residuals will be performed. 
Evaluation of the residuals of the various models to be
considered will address possible departures from the model,
including: the regression model does not hold, the error
terms do not have constant variance, the error terms are not
independent, the model fits all but one or a few outliers,
and the error terms are not normally distributed.

After fitting a model to the data, residuals that fall
into a horizontal band centered at zero, display no
systematic tendency to be positive or negative, and
appear to be randomly scattered suggest that the
assumptions of the model are not violated. This
would imply the model is well suited to the data. If this
is not the case, remedial measures need to be taken.
Generally speaking, two types of remedial measures
are normally followed: abandon the model altogether, or
use some transformation on the data so that the model is
appropriate for the transformed data. In this report, only two
aspects of data transformation will be addressed:
autocorrelation and the handling of outliers. If, after these
two problems are dealt with, further residual analysis
still clearly implies that the assumptions of the model are
not met, the model will be rejected.

1. Autocorrelation

The regression models of ordinary least squares or 
maximum likelihood techniques consider the stochastic 
disturbance terms, the residuals of the regression, to be 
either uncorrelated or independent normal random variables. 
In the application of regression models to learning curves,
we use time series data. The assumption of no correlation
or independence between error terms for time series data is 
often inappropriate. The observed correlation between 
residuals of regression modeling is called autocorrelation 
or serial correlation. 

Neter and Wasserman outline the problems associated 
with autocorrelation: 

i) The regular least squares regression coefficients are 
still unbiased but no longer have the minimum 
variance property and may be quite inefficient. 

ii) The mean squared error (MSE) may seriously 
underestimate the variance of the error terms. 

iii) The estimated standard deviation of the regression
coefficients may be seriously underestimated and $R^2$
may be overestimated.

iv) The confidence intervals and tests using the
Student's t and F distributions are no longer
strictly applicable. [Ref. 20:p. 352]

In this study, the existence of first-order autocorrelation,
AR(1), will be investigated graphically and
will be statistically tested using the Durbin-Watson test.
If examination of the residuals indicates that autocorrelation
exists, this information will be used to improve the
regression model. The autocorrelation will be modeled and
accounted for in a transformation of the model data.
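
The Durbin-Watson statistic is computed from the residuals taken in time order as the sum of squared successive differences divided by the sum of squared residuals. A value near 2 is consistent with no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The sketch below computes the statistic only; the formal test compares it against tabulated lower and upper bounds, which are not reproduced here. The function name and both residual series are hypothetical.

```python
def durbin_watson(residuals):
    """D = sum (e_t - e_{t-1})^2 / sum e_t^2, residuals in time order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residual series, for illustration only.
trending = [1.0, 0.8, 0.5, 0.1, -0.3, -0.6, -0.9, -0.6]   # slow drift
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]           # sign flips

print(durbin_watson(trending))      # well below 2
print(durbin_watson(alternating))   # well above 2
```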

The first-order autocorrelation error model
discussed by Neter and Wasserman [Ref. 20:p. 353] for a
simple linear regression is: